Dataset contains below columns
# Importing several helpful packages & useful analytics libraries
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
# loading the csv file
df= pd.read_csv(r"C:\Users\purna\Desktop\Projects\Data Analysis\Mutual fund insights\comprehensive_mutual_funds_data.csv",na_values='-')
df.head()
| scheme_name | min_sip | min_lumpsum | expense_ratio | fund_size_cr | fund_age_yr | fund_manager | sortino | alpha | sd | beta | sharpe | risk_level | amc_name | rating | category | sub_category | returns_1yr | returns_3yr | returns_5yr | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Aditya Birla SL Active Debt Multi-Mgr FoF-Dir ... | 100 | 100 | 0.27 | 10.0 | 10 | Kaustubh Gupta | 0.32 | 2.24 | 9.39 | 0.01 | 0.24 | 3 | Aditya Birla Sun Life Mutual Fund | 3 | Other | FoFs Domestic | 4.0 | 6.5 | 6.9 |
| 1 | Aditya Birla SL Arbitrage Fund | 1000 | 1000 | 0.36 | 4288.0 | 10 | Lovelish Solanki | 1.33 | 1.53 | 0.72 | 0.56 | 1.10 | 1 | Aditya Birla Sun Life Mutual Fund | 3 | Hybrid | Arbitrage Mutual Funds | 5.6 | 4.8 | 5.5 |
| 2 | Aditya Birla SL Asset Allocator FoF-Dir Growth | 1000 | 1000 | 0.53 | 157.0 | 10 | Vinod Narayan Bhat | 3.44 | 2.67 | 10.58 | 0.67 | 1.42 | 5 | Aditya Birla Sun Life Mutual Fund | 3 | Other | FoFs Domestic | 2.0 | 18.9 | 9.7 |
| 3 | Aditya Birla SL Bal Bhavishya Yojna – Dir Growth | 500 | 1000 | 0.76 | 637.0 | 4 | Atul Penkar | 2.18 | -6.37 | 14.99 | 0.85 | 0.90 | 6 | Aditya Birla Sun Life Mutual Fund | 2 | Solution Oriented | Childrens Funds | -0.7 | 17.1 | NaN |
| 4 | Aditya Birla SL Balanced Advantage Fund | 100 | 100 | 0.61 | 6386.0 | 10 | Mohit Sharma | 3.69 | 1.99 | 10.38 | 0.68 | 1.39 | 6 | Aditya Birla Sun Life Mutual Fund | 4 | Hybrid | Dynamic Asset Allocation or Balanced Advantage | 4.5 | 18.6 | 9.7 |
# loading the csv file
df= pd.read_csv(r"C:\Users\purna\Desktop\Projects\Data Analysis\Mutual fund insights\comprehensive_mutual_funds_data.csv",na_values='-')
df.head()
| scheme_name | min_sip | min_lumpsum | expense_ratio | fund_size_cr | fund_age_yr | fund_manager | sortino | alpha | sd | beta | sharpe | risk_level | amc_name | rating | category | sub_category | returns_1yr | returns_3yr | returns_5yr | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Aditya Birla SL Active Debt Multi-Mgr FoF-Dir ... | 100 | 100 | 0.27 | 10.0 | 10 | Kaustubh Gupta | 0.32 | 2.24 | 9.39 | 0.01 | 0.24 | 3 | Aditya Birla Sun Life Mutual Fund | 3 | Other | FoFs Domestic | 4.0 | 6.5 | 6.9 |
| 1 | Aditya Birla SL Arbitrage Fund | 1000 | 1000 | 0.36 | 4288.0 | 10 | Lovelish Solanki | 1.33 | 1.53 | 0.72 | 0.56 | 1.10 | 1 | Aditya Birla Sun Life Mutual Fund | 3 | Hybrid | Arbitrage Mutual Funds | 5.6 | 4.8 | 5.5 |
| 2 | Aditya Birla SL Asset Allocator FoF-Dir Growth | 1000 | 1000 | 0.53 | 157.0 | 10 | Vinod Narayan Bhat | 3.44 | 2.67 | 10.58 | 0.67 | 1.42 | 5 | Aditya Birla Sun Life Mutual Fund | 3 | Other | FoFs Domestic | 2.0 | 18.9 | 9.7 |
| 3 | Aditya Birla SL Bal Bhavishya Yojna – Dir Growth | 500 | 1000 | 0.76 | 637.0 | 4 | Atul Penkar | 2.18 | -6.37 | 14.99 | 0.85 | 0.90 | 6 | Aditya Birla Sun Life Mutual Fund | 2 | Solution Oriented | Childrens Funds | -0.7 | 17.1 | NaN |
| 4 | Aditya Birla SL Balanced Advantage Fund | 100 | 100 | 0.61 | 6386.0 | 10 | Mohit Sharma | 3.69 | 1.99 | 10.38 | 0.68 | 1.39 | 6 | Aditya Birla Sun Life Mutual Fund | 4 | Hybrid | Dynamic Asset Allocation or Balanced Advantage | 4.5 | 18.6 | 9.7 |
df['risk_level'].replace({1:'Low',2:'Moderately Low',3:'Moderate',4:'Moderately High',5:'High',6:'Very High'},inplace=True)
# checking the size of data (rows,columns)
df.shape
(814, 20)
# checking for missing values and column datatypes are correct
df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 814 entries, 0 to 813 Data columns (total 20 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 scheme_name 814 non-null object 1 min_sip 814 non-null int64 2 min_lumpsum 814 non-null int64 3 expense_ratio 814 non-null float64 4 fund_size_cr 814 non-null float64 5 fund_age_yr 814 non-null int64 6 fund_manager 814 non-null object 7 sortino 791 non-null float64 8 alpha 772 non-null float64 9 sd 790 non-null float64 10 beta 772 non-null float64 11 sharpe 791 non-null float64 12 risk_level 814 non-null object 13 amc_name 814 non-null object 14 rating 814 non-null int64 15 category 814 non-null object 16 sub_category 814 non-null object 17 returns_1yr 814 non-null float64 18 returns_3yr 793 non-null float64 19 returns_5yr 647 non-null float64 dtypes: float64(10), int64(4), object(6) memory usage: 127.3+ KB
# checking for missing values
df.isnull().sum()
scheme_name 0 min_sip 0 min_lumpsum 0 expense_ratio 0 fund_size_cr 0 fund_age_yr 0 fund_manager 0 sortino 23 alpha 42 sd 24 beta 42 sharpe 23 risk_level 0 amc_name 0 rating 0 category 0 sub_category 0 returns_1yr 0 returns_3yr 21 returns_5yr 167 dtype: int64
We will try to find why sortino, alpha, sd,beta,share,returns_3yr and returns_5yr data is missing
# checking for duplicate rows if any
df.duplicated().sum()
0
# Lets investigate why there are missing values in returns_3yr and returns_5yr
df[df['returns_3yr'].isnull()]
| scheme_name | min_sip | min_lumpsum | expense_ratio | fund_size_cr | fund_age_yr | fund_manager | sortino | alpha | sd | beta | sharpe | risk_level | amc_name | rating | category | sub_category | returns_1yr | returns_3yr | returns_5yr | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 61 | AXIS Floater Fund | 1000 | 5000 | 0.16 | 416.0 | 2 | Aditya Pagaria | NaN | NaN | NaN | NaN | NaN | Moderately Low | Axis Mutual Fund | 2 | Debt | Floater Mutual Funds | 5.8 | NaN | NaN |
| 104 | Baroda BNP Paribas Banking & PSU Bond Fund | 500 | 5000 | 0.39 | 40.0 | 2 | Prashant Pimple | NaN | NaN | NaN | NaN | NaN | Moderately Low | Baroda BNP Paribas Mutual Fund | 2 | Debt | Banking and PSU Mutual Funds | 2.5 | NaN | NaN |
| 125 | BHARAT Bond FOF – April 2025 – Direct Growth | 500 | 1000 | 0.06 | 3912.0 | 3 | Dhawal Dalal | NaN | NaN | NaN | NaN | NaN | Moderately Low | Edelweiss Mutual Fund | 2 | Other | FoFs Domestic | 2.6 | NaN | NaN |
| 154 | DSP Floater Fund | 500 | 500 | 0.24 | 983.0 | 2 | Kedar Karnik | NaN | NaN | NaN | NaN | NaN | Moderately Low | DSP Mutual Fund | 1 | Debt | Floater Mutual Funds | 4.4 | NaN | NaN |
| 398 | Invesco India Medium Duration Fund | 1000 | 1000 | 0.40 | 306.0 | 2 | Vikas Garg | NaN | NaN | NaN | NaN | NaN | Moderate | Invesco Mutual Fund | 2 | Debt | Medium Duration Funds | 3.5 | NaN | NaN |
| 411 | ITI Banking & PSU Debt Fund | 500 | 5000 | 0.15 | 27.0 | 2 | Vikrant Mehta | NaN | NaN | NaN | NaN | NaN | Moderately Low | ITI Mutual Fund | 5 | Debt | Banking and PSU Mutual Funds | 5.4 | NaN | NaN |
| 412 | ITI Dynamic Bond Fund | 500 | 5000 | 0.14 | 57.0 | 2 | Vikrant Mehta | NaN | NaN | NaN | NaN | NaN | Moderately Low | ITI Mutual Fund | 5 | Debt | Dynamic Bond | 5.7 | NaN | NaN |
| 417 | ITI Ultra Short Duration Fund | 500 | 5000 | 0.10 | 119.0 | 2 | Vikrant Mehta | NaN | NaN | NaN | NaN | NaN | Moderately Low | ITI Mutual Fund | 2 | Debt | Ultra Short Duration Funds | 5.3 | NaN | NaN |
| 495 | Mahindra Manulife Short Duration Fund | 500 | 1000 | 0.29 | 43.0 | 2 | Rahul Pal | NaN | NaN | NaN | NaN | NaN | Moderate | Mahindra Manulife Mutual Fund | 3 | Debt | Short Duration Funds | 4.4 | NaN | NaN |
| 497 | Mirae Asset Banking and PSU Debt Fund | 1000 | 5000 | 0.34 | 89.0 | 3 | Basant Bafna | NaN | NaN | NaN | NaN | NaN | Moderately Low | Mirae Asset Mutual Fund | 3 | Debt | Banking and PSU Mutual Funds | 3.7 | NaN | NaN |
| 499 | Mirae Asset Corporate Bond Fund | 1000 | 5000 | 0.46 | 57.0 | 2 | Mahendra Jajoo | NaN | NaN | NaN | NaN | NaN | Moderately Low | Mirae Asset Mutual Fund | 3 | Debt | Corporate Bond Mutual Funds | 3.7 | NaN | NaN |
| 509 | Mirae Asset Money Market Fund | 1000 | 5000 | 0.25 | 59.0 | 2 | Basant Bafna | NaN | NaN | NaN | NaN | NaN | Moderately Low | Mirae Asset Mutual Fund | 3 | Debt | Money Market Funds | 5.3 | NaN | NaN |
| 514 | Mirae Asset Ultra Short Duration Fund | 1000 | 5000 | 0.29 | 284.0 | 2 | Basant Bafna | NaN | NaN | NaN | NaN | NaN | Moderately Low | Mirae Asset Mutual Fund | 3 | Debt | Ultra Short Duration Funds | 5.5 | NaN | NaN |
| 631 | SBI Floating Rate Debt Fund | 500 | 5000 | 0.26 | 1604.0 | 2 | Rajeev Radhakrishnan | NaN | NaN | NaN | NaN | NaN | Moderately Low | SBI Mutual Fund | 3 | Debt | Floater Mutual Funds | 5.0 | NaN | NaN |
| 689 | Sundaram Large Cap Fund | 100 | 100 | 0.60 | 2855.0 | 2 | Sudhir Kedia | 3.64 | -0.51 | 15.44 | 0.91 | 1.3 | Very High | Sundaram Mutual Fund | 5 | Equity | Large Cap Mutual Funds | 0.9 | NaN | NaN |
| 720 | Tata Floating Rate Fund | 150 | 5000 | 0.30 | 225.0 | 2 | Akhil Mittal | NaN | NaN | NaN | NaN | NaN | Moderately Low | Tata Mutual Fund | 4 | Debt | Floater Mutual Funds | 5.6 | NaN | NaN |
| 755 | TRUSTMF Banking & PSU Debt Fund | 1000 | 1000 | 0.21 | 345.0 | 2 | Anand Nevatia | NaN | NaN | NaN | NaN | NaN | Moderate | Trust Mutual Fund | 2 | Debt | Banking and PSU Mutual Funds | 3.0 | NaN | NaN |
| 756 | TRUSTMF Liquid Fund | 1000 | 1000 | 0.10 | 205.0 | 2 | Anand Nevatia | NaN | NaN | NaN | NaN | NaN | Moderately Low | Trust Mutual Fund | 2 | Debt | Liquid Mutual Funds | 5.6 | NaN | NaN |
| 757 | TRUSTMF Short Term Fund | 1000 | 1000 | 0.23 | 94.0 | 2 | Anand Nevatia | NaN | NaN | NaN | NaN | NaN | Moderately Low | Trust Mutual Fund | 2 | Debt | Short Duration Funds | 4.2 | NaN | NaN |
| 769 | Union Medium Duration Fund | 500 | 1000 | 0.64 | 170.0 | 3 | Parijat Agrawal | NaN | NaN | NaN | NaN | NaN | Moderate | Union Mutual Fund | 2 | Debt | Medium Duration Funds | 2.3 | NaN | NaN |
| 770 | Union Money Market Fund | 2000 | 5000 | 0.24 | 144.0 | 2 | Parijat Agrawal | NaN | NaN | NaN | NaN | NaN | Moderately Low | Union Mutual Fund | 3 | Debt | Money Market Funds | 5.4 | NaN | NaN |
We find that there are missing values in sortino, alpha,beta,sd,sharpe, returns_3yr and returns_5yr since fund age is less than 3 years It seems that sortino, alpha,beta,sd,sharpe ratios are calculated considering 3 years data
df[df['returns_5yr'].isnull()]
| scheme_name | min_sip | min_lumpsum | expense_ratio | fund_size_cr | fund_age_yr | fund_manager | sortino | alpha | sd | beta | sharpe | risk_level | amc_name | rating | category | sub_category | returns_1yr | returns_3yr | returns_5yr | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | Aditya Birla SL Bal Bhavishya Yojna – Dir Growth | 500 | 1000 | 0.76 | 637.0 | 4 | Atul Penkar | 2.18 | -6.37 | 14.99 | 0.85 | 0.90 | Very High | Aditya Birla Sun Life Mutual Fund | 2 | Solution Oriented | Childrens Funds | -0.7 | 17.1 | NaN |
| 38 | Aditya Birla SL Overnight Fund | 0 | 5000 | 0.07 | 13399.0 | 4 | Kaustubh Gupta | 0.11 | 0.58 | 0.35 | 0.43 | 0.02 | Low | Aditya Birla Sun Life Mutual Fund | 3 | Debt | Overnight Mutual Funds | 5.4 | 3.9 | NaN |
| 41 | Aditya Birla SL Retirement-The 30s Plan-Dir Gr... | 500 | 1000 | 1.06 | 258.0 | 4 | Dhaval Shah | 2.41 | -5.61 | 14.94 | 0.84 | 0.94 | Very High | Aditya Birla Sun Life Mutual Fund | 2 | Solution Oriented | Retirement Funds | 0.0 | 17.9 | NaN |
| 42 | Aditya Birla SL Retirement-The 40s Plan-Dir Gr... | 500 | 1000 | 1.07 | 91.0 | 4 | Dhaval Shah | 2.34 | -3.14 | 11.86 | 0.77 | 0.93 | Very High | Aditya Birla Sun Life Mutual Fund | 2 | Solution Oriented | Retirement Funds | -0.1 | 15.0 | NaN |
| 43 | Aditya Birla SL Retirement-The 50s Plan-Dir Gr... | 500 | 1000 | 0.63 | 27.0 | 4 | Dhaval Shah | 0.29 | -2.04 | 2.99 | 0.34 | 0.13 | Moderately High | Aditya Birla Sun Life Mutual Fund | 2 | Solution Oriented | Retirement Funds | 3.0 | 4.6 | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 784 | UTI-Equity Savings Fund | 0 | 5000 | 0.68 | 257.0 | 5 | V Srivatsa | 4.06 | 3.98 | 5.99 | 0.91 | 1.75 | Moderate | UTI Mutual Fund | 4 | Hybrid | Equity Savings Mutual Funds | 6.1 | 14.7 | NaN |
| 786 | UTI-Floater Fund | 500 | 500 | 0.35 | 1637.0 | 4 | Sudhir Agrawal | 3.07 | 3.41 | 1.05 | 1.41 | 1.45 | Moderately Low | UTI Mutual Fund | 3 | Debt | Floater Mutual Funds | 5.1 | 5.9 | NaN |
| 801 | UTI-Nifty Next 50 Index Fund | 0 | 5000 | 0.33 | 1926.0 | 5 | Sharwan Kumar Goyal | 1.71 | -4.62 | 18.16 | 0.99 | 0.97 | Very High | UTI Mutual Fund | 2 | Other | Index Funds | -6.4 | 22.2 | NaN |
| 811 | WhiteOak Capital Liquid Fund | 500 | 500 | 0.26 | 197.0 | 4 | Piyush Baranwal | -0.42 | 0.93 | 0.39 | 0.72 | -0.07 | Low | WhiteOak Capital Mutual Fund | 1 | Debt | Liquid Mutual Funds | 5.3 | 3.8 | NaN |
| 813 | WhiteOak Capital Ultra Short Term Fund | 500 | 500 | 0.33 | 141.0 | 4 | Piyush Baranwal | 2.81 | 1.20 | 0.72 | 0.47 | 0.80 | Low | WhiteOak Capital Mutual Fund | 2 | Debt | Ultra Short Duration Funds | 5.3 | 4.5 | NaN |
167 rows × 20 columns
Clearly missing values in returns_5yr is there as fund age is not greater than 5 years
# mean returns across category
mean_returns_by_category = df.groupby('category')['returns_3yr','returns_5yr'].mean()
mean_returns_by_category
C:\Users\purna\AppData\Local\Temp\ipykernel_39816\2654559691.py:2: FutureWarning: Indexing with multiple keys (implicitly converted to a tuple of keys) will be deprecated, use a list instead.
mean_returns_by_category = df.groupby('category')['returns_3yr','returns_5yr'].mean()
| returns_3yr | returns_5yr | |
|---|---|---|
| category | ||
| Debt | 6.228517 | 6.385446 |
| Equity | 29.738111 | 12.069767 |
| Hybrid | 15.272414 | 8.581053 |
| Other | 20.792405 | 10.710938 |
| Solution Oriented | 18.167857 | 9.747059 |
# mean performance ratios across category
mean_ratios_by_category = df.groupby('category')['sortino','alpha','sd','beta','sharpe'].mean()
mean_ratios_by_category
C:\Users\purna\AppData\Local\Temp\ipykernel_39816\4022942247.py:2: FutureWarning: Indexing with multiple keys (implicitly converted to a tuple of keys) will be deprecated, use a list instead.
mean_ratios_by_category = df.groupby('category')['sortino','alpha','sd','beta','sharpe'].mean()
| sortino | alpha | sd | beta | sharpe | |
|---|---|---|---|---|---|
| category | |||||
| Debt | 2.139885 | 2.160000 | 2.092107 | 0.807214 | 0.892366 |
| Equity | 3.110392 | 3.402550 | 16.959477 | 0.878642 | 1.459837 |
| Hybrid | 2.776552 | 2.486034 | 7.493190 | 0.707155 | 1.311638 |
| Other | 2.420633 | 0.513594 | 15.480506 | 0.850938 | 1.090253 |
| Solution Oriented | 2.589286 | 0.700357 | 10.272143 | 0.765714 | 1.204286 |
# imputing missing values with mean of category
df['returns_3yr'] = df.groupby('category')['returns_3yr'].apply(lambda x: x.fillna(x.mean()))
df['returns_5yr'] = df.groupby('category')['returns_5yr'].apply(lambda x: x.fillna(x.mean()))
df['sortino'] = df.groupby('category')['sortino'].apply(lambda x: x.fillna(x.mean()))
df['alpha'] = df.groupby('category')['alpha'].apply(lambda x: x.fillna(x.mean()))
df['sd'] = df.groupby('category')['sd'].apply(lambda x: x.fillna(x.mean()))
df['beta'] = df.groupby('category')['beta'].apply(lambda x: x.fillna(x.mean()))
df['sharpe'] = df.groupby('category')['sharpe'].apply(lambda x: x.fillna(x.mean()))
C:\Users\purna\AppData\Local\Temp\ipykernel_39816\2214251339.py:3: FutureWarning: Not prepending group keys to the result index of transform-like apply. In the future, the group keys will be included in the index, regardless of whether the applied function returns a like-indexed object.
To preserve the previous behavior, use
>>> .groupby(..., group_keys=False)
To adopt the future behavior and silence this warning, use
>>> .groupby(..., group_keys=True)
df['returns_3yr'] = df.groupby('category')['returns_3yr'].apply(lambda x: x.fillna(x.mean()))
C:\Users\purna\AppData\Local\Temp\ipykernel_39816\2214251339.py:4: FutureWarning: Not prepending group keys to the result index of transform-like apply. In the future, the group keys will be included in the index, regardless of whether the applied function returns a like-indexed object.
To preserve the previous behavior, use
>>> .groupby(..., group_keys=False)
To adopt the future behavior and silence this warning, use
>>> .groupby(..., group_keys=True)
df['returns_5yr'] = df.groupby('category')['returns_5yr'].apply(lambda x: x.fillna(x.mean()))
C:\Users\purna\AppData\Local\Temp\ipykernel_39816\2214251339.py:5: FutureWarning: Not prepending group keys to the result index of transform-like apply. In the future, the group keys will be included in the index, regardless of whether the applied function returns a like-indexed object.
To preserve the previous behavior, use
>>> .groupby(..., group_keys=False)
To adopt the future behavior and silence this warning, use
>>> .groupby(..., group_keys=True)
df['sortino'] = df.groupby('category')['sortino'].apply(lambda x: x.fillna(x.mean()))
C:\Users\purna\AppData\Local\Temp\ipykernel_39816\2214251339.py:6: FutureWarning: Not prepending group keys to the result index of transform-like apply. In the future, the group keys will be included in the index, regardless of whether the applied function returns a like-indexed object.
To preserve the previous behavior, use
>>> .groupby(..., group_keys=False)
To adopt the future behavior and silence this warning, use
>>> .groupby(..., group_keys=True)
df['alpha'] = df.groupby('category')['alpha'].apply(lambda x: x.fillna(x.mean()))
C:\Users\purna\AppData\Local\Temp\ipykernel_39816\2214251339.py:7: FutureWarning: Not prepending group keys to the result index of transform-like apply. In the future, the group keys will be included in the index, regardless of whether the applied function returns a like-indexed object.
To preserve the previous behavior, use
>>> .groupby(..., group_keys=False)
To adopt the future behavior and silence this warning, use
>>> .groupby(..., group_keys=True)
df['sd'] = df.groupby('category')['sd'].apply(lambda x: x.fillna(x.mean()))
C:\Users\purna\AppData\Local\Temp\ipykernel_39816\2214251339.py:8: FutureWarning: Not prepending group keys to the result index of transform-like apply. In the future, the group keys will be included in the index, regardless of whether the applied function returns a like-indexed object.
To preserve the previous behavior, use
>>> .groupby(..., group_keys=False)
To adopt the future behavior and silence this warning, use
>>> .groupby(..., group_keys=True)
df['beta'] = df.groupby('category')['beta'].apply(lambda x: x.fillna(x.mean()))
C:\Users\purna\AppData\Local\Temp\ipykernel_39816\2214251339.py:9: FutureWarning: Not prepending group keys to the result index of transform-like apply. In the future, the group keys will be included in the index, regardless of whether the applied function returns a like-indexed object.
To preserve the previous behavior, use
>>> .groupby(..., group_keys=False)
To adopt the future behavior and silence this warning, use
>>> .groupby(..., group_keys=True)
df['sharpe'] = df.groupby('category')['sharpe'].apply(lambda x: x.fillna(x.mean()))
# verifying missing values are imputed
df.isnull().sum()
scheme_name 0 min_sip 0 min_lumpsum 0 expense_ratio 0 fund_size_cr 0 fund_age_yr 0 fund_manager 0 sortino 0 alpha 0 sd 0 beta 0 sharpe 0 risk_level 0 amc_name 0 rating 0 category 0 sub_category 0 returns_1yr 0 returns_3yr 0 returns_5yr 0 dtype: int64
# checking statistics of numerical columns
df.describe()
| min_sip | min_lumpsum | expense_ratio | fund_size_cr | fund_age_yr | sortino | alpha | sd | beta | sharpe | rating | returns_1yr | returns_3yr | returns_5yr | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 814.000000 | 814.000000 | 814.000000 | 814.000000 | 814.000000 | 814.000000 | 814.000000 | 814.000000 | 814.000000 | 814.000000 | 814.000000 | 814.000000 | 814.000000 | 814.000000 |
| mean | 528.500000 | 3047.468059 | 0.713231 | 3812.854214 | 8.319410 | 2.640883 | 2.464598 | 10.084476 | 0.822852 | 1.197011 | 2.641278 | 3.921376 | 18.254880 | 9.389900 |
| std | 369.184517 | 2514.619513 | 0.482161 | 7181.479475 | 2.640622 | 1.374909 | 3.859215 | 7.808093 | 1.275605 | 0.531163 | 1.463476 | 6.675502 | 12.101639 | 3.497044 |
| min | 0.000000 | 0.000000 | 0.000000 | 2.380000 | 1.000000 | -1.770000 | -13.360000 | 0.270000 | -31.860000 | -0.820000 | 0.000000 | -19.700000 | 3.300000 | -4.100000 |
| 25% | 150.000000 | 500.000000 | 0.310000 | 175.250000 | 6.000000 | 1.810000 | 0.513594 | 2.092107 | 0.710000 | 0.894275 | 2.000000 | 1.500000 | 6.228517 | 6.389085 |
| 50% | 500.000000 | 5000.000000 | 0.615000 | 940.500000 | 10.000000 | 2.825000 | 1.885000 | 11.870000 | 0.850938 | 1.290000 | 3.000000 | 4.400000 | 18.050000 | 8.800000 |
| 75% | 1000.000000 | 5000.000000 | 1.030000 | 3819.000000 | 10.000000 | 3.460000 | 3.727500 | 16.410000 | 0.960000 | 1.540000 | 4.000000 | 5.600000 | 27.000000 | 12.069767 |
| max | 2000.000000 | 25000.000000 | 2.590000 | 57052.000000 | 17.000000 | 12.570000 | 27.240000 | 77.720000 | 5.400000 | 3.520000 | 5.000000 | 130.800000 | 71.400000 | 23.200000 |
There seems to be presence of outliers in columns min_lumpsum, fund_size_cr as mean and median differ significantly
df.head()
| scheme_name | min_sip | min_lumpsum | expense_ratio | fund_size_cr | fund_age_yr | fund_manager | sortino | alpha | sd | beta | sharpe | risk_level | amc_name | rating | category | sub_category | returns_1yr | returns_3yr | returns_5yr | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Aditya Birla SL Active Debt Multi-Mgr FoF-Dir ... | 100 | 100 | 0.27 | 10.0 | 10 | Kaustubh Gupta | 0.32 | 2.24 | 9.39 | 0.01 | 0.24 | Moderate | Aditya Birla Sun Life Mutual Fund | 3 | Other | FoFs Domestic | 4.0 | 6.5 | 6.900000 |
| 1 | Aditya Birla SL Arbitrage Fund | 1000 | 1000 | 0.36 | 4288.0 | 10 | Lovelish Solanki | 1.33 | 1.53 | 0.72 | 0.56 | 1.10 | Low | Aditya Birla Sun Life Mutual Fund | 3 | Hybrid | Arbitrage Mutual Funds | 5.6 | 4.8 | 5.500000 |
| 2 | Aditya Birla SL Asset Allocator FoF-Dir Growth | 1000 | 1000 | 0.53 | 157.0 | 10 | Vinod Narayan Bhat | 3.44 | 2.67 | 10.58 | 0.67 | 1.42 | High | Aditya Birla Sun Life Mutual Fund | 3 | Other | FoFs Domestic | 2.0 | 18.9 | 9.700000 |
| 3 | Aditya Birla SL Bal Bhavishya Yojna – Dir Growth | 500 | 1000 | 0.76 | 637.0 | 4 | Atul Penkar | 2.18 | -6.37 | 14.99 | 0.85 | 0.90 | Very High | Aditya Birla Sun Life Mutual Fund | 2 | Solution Oriented | Childrens Funds | -0.7 | 17.1 | 9.747059 |
| 4 | Aditya Birla SL Balanced Advantage Fund | 100 | 100 | 0.61 | 6386.0 | 10 | Mohit Sharma | 3.69 | 1.99 | 10.38 | 0.68 | 1.39 | Very High | Aditya Birla Sun Life Mutual Fund | 4 | Hybrid | Dynamic Asset Allocation or Balanced Advantage | 4.5 | 18.6 | 9.700000 |
# fund manager with highest returns in 5 years
df.groupby(['fund_manager'])['returns_5yr'].mean().sort_values(ascending=False).nlargest(10).plot(kind='barh')
plt.title('Top 10 fund manager by 5 years returns')
plt.show()
# creating category wise dataframes
debt_df = df[df['category'] == 'Debt']
equity_df = df[df['category'] == 'Equity']
hybrid_df = df[df['category'] == 'Hybrid']
other_df = df[df['category'] == 'Other']
solution_df = df[df['category'] == 'Solution Oriented']
# calculating top 5 fund manager with avg highest returns in each category
top_5_debt = debt_df.groupby('fund_manager')['returns_5yr'].mean().sort_values(ascending=False).head().reset_index()
top_5_equity = equity_df.groupby('fund_manager')['returns_5yr'].mean().sort_values(ascending=False).head().reset_index()
top_5_hybrid = hybrid_df.groupby('fund_manager')['returns_5yr'].mean().sort_values(ascending=False).head().reset_index()
top_5_solution = solution_df.groupby('fund_manager')['returns_5yr'].mean().sort_values(ascending=False).head().reset_index()
top_5_other = other_df.groupby('fund_manager')['returns_5yr'].mean().sort_values(ascending=False).head().reset_index()
# create the plot
fig, ax = plt.subplots(nrows=1,ncols=5,figsize=(14, 4),sharey='row')
sns.barplot(x='fund_manager', y='returns_5yr', data=top_5_debt, ax=ax[0],color='#F1DC2F')
sns.barplot(x='fund_manager', y='returns_5yr', data=top_5_equity, ax=ax[1],color='#F1512F')
sns.barplot(x='fund_manager', y='returns_5yr', data=top_5_hybrid, ax=ax[2],color='#27AF17')
sns.barplot(x='fund_manager', y='returns_5yr', data=top_5_solution, ax=ax[3],color='#17AFA3')
sns.barplot(x='fund_manager', y='returns_5yr', data=top_5_other, ax=ax[4],color='#AF178A')
# set plot title and x labels
ax[0].set_title('Debt Funds Returns')
ax[0].set_xticklabels(top_5_debt['fund_manager'],rotation='vertical')
ax[1].set_title('Equity Funds Returns')
ax[1].set_xticklabels(top_5_equity['fund_manager'],rotation='vertical')
ax[2].set_title('Hybrid Funds Returns')
ax[2].set_xticklabels(top_5_hybrid['fund_manager'],rotation='vertical')
ax[3].set_title('Solution Funds Returns')
ax[3].set_xticklabels(top_5_solution['fund_manager'],rotation='vertical')
ax[4].set_title('Other Funds Returns')
ax[4].set_xticklabels(top_5_other['fund_manager'],rotation='vertical')
# show the plot
plt.tight_layout()
plt.suptitle('Top 5 fund manager in each category by average 5 year of returns')
plt.subplots_adjust(top=0.8)
plt.show()
Conclusion
# top 10 fund manager by the number of schemes
df['fund_manager'].value_counts().nlargest(10).plot(kind='barh')
plt.xlabel('Number of schemes')
plt.title('Top 10 fund manager by No of schemes')
plt.show()
Conclusion
# top 10 fund manager by the mean age of funds
df.groupby(['fund_manager'])['fund_age_yr'].mean().sort_values(ascending=False).nlargest(10).plot(kind='barh')
plt.xlabel('Age of funds in year')
plt.title('Top 10 fund manager by oldest schemes')
plt.show()
Conclusion
# top 10 fund manager by the assets under management
df.groupby(['fund_manager'])['fund_size_cr'].mean().sort_values(ascending=False).nlargest(10).plot(kind='barh')
plt.xlabel('Fund value in Crore')
plt.title('Top 10 fund manager having highest assets under management')
plt.show()
Conclusion
# distribution of min_sip
fig, ax = plt.subplots(nrows=1,ncols=2,figsize=(14, 4))
sns.histplot(df['min_sip'],ax=ax[0],kde=True)
sns.boxplot(df['min_sip'],ax=ax[1])
<Axes: >
Conclusion
# distribution of min_lumpsum
fig, ax = plt.subplots(nrows=1,ncols=2,figsize=(14, 4))
sns.histplot(df['min_lumpsum'],ax=ax[0],kde=True)
sns.boxplot(df['min_lumpsum'],ax=ax[1])
<Axes: >
Conclusion
# distribution of min_sip and min_lumpsum by category type
fig, axes = plt.subplots(nrows=2, ncols=1, figsize=(12, 8))
sns.boxplot(data = df,y=df['category'],x=df['min_lumpsum'],ax=axes[0])
sns.boxplot(data = df,y=df['category'],x=df['min_sip'],ax=axes[1])
axes[0].set_title('Lumpsum investment required by Category type')
axes[1].set_title('SIP investment required by Category type')
plt.subplots_adjust(bottom=4,top=5)
plt.show()
Conclusion
# distribution of min_sip,min_lumpsum by sub_category type
fig, axes = plt.subplots(nrows=5, ncols=2, figsize=(12, 16),sharey='row')
sns.boxplot(equity_df,y='sub_category',x='min_lumpsum',ax=axes[0,0])
sns.boxplot(hybrid_df,y='sub_category',x='min_lumpsum',ax=axes[1,0])
sns.boxplot(solution_df,y='sub_category',x='min_lumpsum',ax=axes[2,0])
sns.boxplot(other_df,y='sub_category',x='min_lumpsum',ax=axes[3,0])
sns.boxplot(debt_df,y='sub_category',x='min_lumpsum',ax=axes[4,0])
axes[0,0].set_title('Minimum Lumpsum Investment of Equity fund by sub-Category')
axes[1,0].set_title('Minimum Lumpsum Investment of Hybrid fund by sub-Category')
axes[2,0].set_title('Minimum Lumpsum Investment of Solution Oriented fund by sub-Category')
axes[3,0].set_title('Minimum Lumpsum Investment of Other fund by sub-Category')
axes[4,0].set_title('Minimum Lumpsum Investment of Debt fund by sub-Category')
sns.boxplot(equity_df,y='sub_category',x='min_sip',ax=axes[0,1])
sns.boxplot(hybrid_df,y='sub_category',x='min_sip',ax=axes[1,1])
sns.boxplot(solution_df,y='sub_category',x='min_sip',ax=axes[2,1])
sns.boxplot(other_df,y='sub_category',x='min_sip',ax=axes[3,1])
sns.boxplot(debt_df,y='sub_category',x='min_sip',ax=axes[4,1])
axes[0,1].set_title('Minimum SIP Investment of Equity fund by sub-Category')
axes[1,1].set_title('Minimum SIP Investment of Hybrid fund by sub-Category')
axes[2,1].set_title('Minimum SIP Investment of Solution Oriented fund by sub-Category')
axes[3,1].set_title('Minimum SIP Investment of Other fund by sub-Category')
axes[4,1].set_title('Minimum SIP Investment of Debt fund by sub-Category')
plt.subplots_adjust(bottom=2.5,top=4,left=2,right=3)
plt.show()
# create a function to make a category columns with above fund size
def fund_size_group(fund_size):
if fund_size >= 0.0 and fund_size <= 500.0:
return '0-500'
elif fund_size > 500.0 and fund_size <= 750.0:
return '500-750'
elif fund_size > 750.0 and fund_size <= 2000.0:
return '750-2000'
elif fund_size > 2000.0 and fund_size <= 5000.0:
return '2000-5000'
elif fund_size > 5000.0 and fund_size <= 10000.0:
return '5000-10000'
elif fund_size > 10000.0 and fund_size <= 50000.0:
return '10000-50000'
else:
return '>50000'
# create a column with above fund size group
df['fund_size_group'] = df['fund_size_cr'].apply(fund_size_group)
# expense ratio variation with fund size group across different category
# Create the categorical plot with subplots
g = sns.catplot(data=df, x='fund_size_group', y='expense_ratio', kind='bar', col='category', col_wrap=3)
# Set the rotation of the x-axis tick labels for all subplots
g.set_xticklabels(rotation='vertical')
# set the super title
plt.suptitle('Expense Ratio variation across different Fund size group and category',fontsize=20)
plt.subplots_adjust(top=0.8)
# Show the plot
plt.show()
Findings:
Large fund size does have low expense ratio.
Debt has the lowest 'expense ratio' amongs the funds.
# expense ratio distribution
fig, ax = plt.subplots(nrows=1,ncols=2,figsize=(14, 4))
sns.histplot(df['expense_ratio'],ax=ax[0],kde=True)
sns.boxplot(df['expense_ratio'],ax=ax[1])
<Axes: >
Conclusion
# distribution of expense ratio by category type
plt.figure(figsize=(10,8))
sns.boxplot(df,y=df['category'],x=df['expense_ratio'])
plt.show()
Conclusion
# distribution of expense ratio by sub-category type
fig, axes = plt.subplots(nrows=5, ncols=1, figsize=(10, 16))
sns.boxplot(equity_df,y='sub_category',x='expense_ratio',ax=axes[0])
sns.boxplot(hybrid_df,y='sub_category',x='expense_ratio',ax=axes[1])
sns.boxplot(solution_df,y='sub_category',x='expense_ratio',ax=axes[2])
sns.boxplot(other_df,y='sub_category',x='expense_ratio',ax=axes[3])
sns.boxplot(debt_df,y='sub_category',x='expense_ratio',ax=axes[4])
axes[0].set_title('Expense Ratio of Equity fund by sub-Category type')
axes[1].set_title('Expense Ratio of Hybrid fund by sub-Category type')
axes[2].set_title('Expense Ratio of Solution Oriented fund by sub-Category type')
axes[3].set_title('Expense Ratio of Other fund by sub-Category type')
axes[4].set_title('Expense Ratio of Debt fund by sub-Category type')
plt.subplots_adjust(bottom=2.5,top=4)
plt.show()
Conclusion
Equity type -
Hybrid type-
Solution Oriented -
Other Fund-
Debt Fund-
# expense ratio vs rating
sns.boxplot(df,x='rating',y='expense_ratio')
plt.title('Variation of expense ratio across rating')
plt.show()
Findings -
# Lets find why 0 rating fund has high expense ratio
pd.crosstab(df['rating'],df['category'],margins=True)
| category | Debt | Equity | Hybrid | Other | Solution Oriented | All |
|---|---|---|---|---|---|---|
| rating | ||||||
| 0 | 4 | 78 | 2 | 19 | 5 | 108 |
| 1 | 22 | 18 | 10 | 4 | 4 | 58 |
| 2 | 66 | 54 | 25 | 14 | 9 | 168 |
| 3 | 96 | 76 | 39 | 27 | 5 | 243 |
| 4 | 64 | 52 | 27 | 12 | 3 | 158 |
| 5 | 30 | 30 | 13 | 4 | 2 | 79 |
| All | 282 | 308 | 116 | 80 | 28 | 814 |
Findings
# expense ratio vs risk level
sns.boxplot(df,x='risk_level',y='expense_ratio')
plt.title('Variation of expense ratio across risk level')
plt.xticks(rotation='vertical')
plt.show()
Conclusion
# Lets find why very high risk fund has high expense ratio
pd.crosstab(df['risk_level'],df['category'],margins=True)
| category | Debt | Equity | Hybrid | Other | Solution Oriented | All |
|---|---|---|---|---|---|---|
| risk_level | ||||||
| High | 3 | 0 | 7 | 15 | 2 | 27 |
| Low | 32 | 0 | 21 | 0 | 0 | 53 |
| Moderate | 109 | 0 | 9 | 4 | 2 | 124 |
| Moderately High | 16 | 8 | 29 | 6 | 6 | 65 |
| Moderately Low | 121 | 1 | 4 | 1 | 0 | 127 |
| Very High | 1 | 299 | 46 | 54 | 18 | 418 |
| All | 282 | 308 | 116 | 80 | 28 | 814 |
Findings
# distribution of fund size
sns.histplot(df['fund_size_cr'])
plt.title('Fund Size Distribution')
plt.show()
Findings
# fund size across fund age
#plt.figure(figsize=(12,8))
sns.boxplot(df,x='fund_size_cr',y='fund_age_yr',orient='h')
plt.title('Fund size variations with fund age')
plt.show()
Conclusion
# fund size vs rating
#plt.figure(figsize=(12,8))
sns.boxplot(df,x='fund_size_cr',y='rating',orient='h')
plt.title('Fund size variation with the rating')
plt.show()
Conclusion
# fund size vs risk level
#plt.figure(figsize=(12,8))
sns.boxplot(df[df['fund_size_cr']<10000],x='fund_size_cr',y='risk_level',orient='h')
plt.title('Fund size variations with Risk level')
plt.show()
Conclusion
# top 10 AMC by sum of funds managed
df.groupby('amc_name')['fund_size_cr'].sum().sort_values(ascending=False).head(10).plot(kind='barh')
plt.ylabel('Total Funds managed in Crore')
plt.title('Funds managed by AMC across all schemes')
plt.show()
Conlusion:
# category wise fund
plt.figure(figsize=(10,8))
sns.boxplot(df,y='category',x='fund_size_cr')
plt.title('Funds variation across category')
plt.show()
Conlusion
# sub category wise fund
fig, axes = plt.subplots(nrows=5, ncols=1, figsize=(10, 8),sharex='col')
sns.boxplot(equity_df,y='sub_category',x='fund_size_cr',ax=axes[0])
sns.boxplot(hybrid_df,y='sub_category',x='fund_size_cr',ax=axes[1])
sns.boxplot(solution_df,y='sub_category',x='fund_size_cr',ax=axes[2])
sns.boxplot(other_df,y='sub_category',x='fund_size_cr',ax=axes[3])
sns.boxplot(debt_df,y='sub_category',x='fund_size_cr',ax=axes[4])
axes[0].set_title('Fund size of Equity fund by sub-Category type')
axes[1].set_title('Fund size of Hybrid fund by sub-Category type')
axes[2].set_title('Fund size of Solution Oriented fund by sub-Category type')
axes[3].set_title('Fund size of Other fund by sub-Category type')
axes[4].set_title('Fund size of Debt fund by sub-Category type')
plt.subplots_adjust(bottom=10, top=12)
plt.show()
Conclusion
Equity-
Hybrid-
Debt-
# schemes with high fund also has high returns (schemes giving higher returns will lead to funds inflow)
sns.catplot(df,y='fund_size_group',x='returns_5yr',col='category',col_wrap=3,kind='bar')
plt.subplots_adjust(top=0.8)
plt.suptitle('Returns variations with Fund size',fontsize=15)
plt.show()
Conclusion
# distribution of fund age
sns.histplot(df['fund_age_yr'])
plt.title('Distribution of Fund Age')
plt.show()
sns.histplot(df,x='fund_age_yr',hue='category',kde=True)
plt.title('Distribution of Fund Age')
plt.show()
Findings
# can we say that new funds give better returns ?
sns.catplot(df,x='returns_5yr',y='fund_age_yr',orient='h',kind='box',col='category',col_wrap=3)
plt.subplots_adjust(top=0.8)
plt.suptitle('Returns variations with Fund age',fontsize=15)
plt.show()
Conclusion
# total no of schemes in each risk level
sns.countplot(df,x='risk_level')
plt.title('Number of funds across Risk level')
plt.ylabel('Number of funds')
plt.xticks(rotation='vertical')
plt.show()
Conclusion
# amc and risk level
plt.figure(figsize=(14,10))
sns.heatmap(pd.crosstab(df['amc_name'],df['risk_level'],normalize='index'),annot=True,fmt='.1%')
plt.title('AMC wise schemes distribution across risk level in percentage')
plt.show()
Conclusion
# category and risk level
sns.heatmap(pd.crosstab(df['category'],df['risk_level'],normalize='index'),annot=True,fmt='.1%')
plt.title('Schemes distribution across category and risk level')
plt.show()
Findings
# rating and risk level
sns.heatmap(pd.crosstab(df['rating'],df['risk_level'],normalize='index'),annot=True,fmt='.1%')
plt.title('Schemes distribution across rating and risk level')
plt.show()
Findings -
sns.heatmap(pd.pivot_table(data=df,index='rating',columns='risk_level',values='returns_5yr'),annot=True,fmt='.3g')
plt.title('Variations of 5 years returns across rating and risk level')
plt.show()
Findings -
# high risk level means high returns ??
# Create figure and axis objects
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(12, 6),sharey='row')
# Create a bar plot for each axis
sns.barplot(df,x='risk_level',y='returns_1yr',ax=axes[0])
sns.barplot(df,x='risk_level',y='returns_3yr',ax=axes[1])
sns.barplot(df,x='risk_level',y='returns_5yr',ax=axes[2])
# Set the title for each axis
axes[0].set_title('Returns over 1 year')
axes[0].set_xticklabels(rotation='vertical',labels=df['risk_level'].unique())
axes[1].set_title('Returns over 3 year')
axes[1].set_xticklabels(rotation='vertical',labels=df['risk_level'].unique())
axes[2].set_title('Returns over 5 year')
axes[2].set_xticklabels(rotation='vertical',labels=df['risk_level'].unique())
plt.tight_layout()
plt.show()
Findings -
# distribution of amc
plt.figure(figsize=(10,8))
sns.countplot(df,y='amc_name')
plt.title('Number of schemes vs AMC')
plt.xlabel('Number of schemes')
plt.show()
Findings- ICICI has around 57 schemes running
# avg rating of amc funds
plt.figure(figsize=(10,8))
sns.barplot(df,y='amc_name',x='rating')
plt.title('Avg rating across AMC')
plt.xlabel('Avg rating')
plt.show()
Findings - Quant Mutual Funds have the highest rating.
# amc name vs category
plt.figure(figsize=(10,8))
sns.heatmap(pd.crosstab(df['amc_name'],df['category'],normalize='index'),annot=True,fmt='.1%')
plt.title('Percent of schemes by AMC across category')
plt.show()
Findings -
plt.figure(figsize=(10,8))
sns.heatmap(pd.pivot_table(df,index='amc_name',columns='category',values='returns_5yr'),annot=True,fmt='.2g')
plt.title('Avg of 5 years returns by AMC across category')
plt.show()
Findings-
# amc name vs sub-category
plt.figure(figsize=(22,20))
sns.heatmap(pd.crosstab(df['amc_name'],df['sub_category']),annot=True)
plt.title('Number of schemes by AMC across sub-category')
plt.show()
Findings
plt.figure(figsize=(14,10))
sns.heatmap(pd.pivot_table(df,index='amc_name',columns='sub_category',values='returns_5yr'),annot=True,fmt='.2g')
plt.title('Avg 5 years of Returns given by AMC across different sub-category')
plt.show()
Findings - Quant Mutual Funds have performed very well across different Sub categories.
df['rating'].value_counts().plot(kind='bar')
plt.title('Distribution of Rating')
plt.show()
# does returns vary because of ratings ??
# Create figure and axis objects
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(12, 6),sharey='row')
# Create a bar plot for each axis
sns.barplot(df,x='rating',y='returns_1yr',ax=axes[0])
sns.barplot(df,x='rating',y='returns_3yr',ax=axes[1])
sns.barplot(df,x='rating',y='returns_5yr',ax=axes[2])
# Set the title for each axis
axes[0].set_title('Returns over 1 year')
axes[1].set_title('Returns over 3 year')
axes[2].set_title('Returns over 5 year')
plt.tight_layout()
plt.show()
Findings -
# lets check why 0 rating is giving higher returns
sns.heatmap(pd.crosstab(df['rating'],df['category'],normalize='index'),annot=True,fmt='.1%')
<Axes: xlabel='category', ylabel='rating'>
Findings
plt.figure(figsize=(14,10))
sns.heatmap(pd.pivot_table(df,index='rating',columns='category',values='returns_5yr'),annot=True,fmt='.2g')
plt.title('Avg 5 years of Returns given by Rating and Category')
plt.show()
Findings
sns.barplot(df,x='category',y='rating')
plt.title('Avg Rating across category')
plt.show()
# distribution of category
sns.countplot(df,x='category')
plt.title('Number of schemes across category')
plt.ylabel('Number of schemes')
plt.show()
Findings - Among category most of the schemes are Equity or Debt type
sns.barplot(df,x='category',y='returns_5yr')
plt.title('Avg 5 years returns across category')
plt.ylabel('Average of 5 years returns')
plt.show()
Findings - Equity has given highest return
# sub category vs category
plt.figure(figsize=(8,10))
sns.heatmap(pd.crosstab(df['sub_category'],df['category']),annot=True,fmt='g')
plt.title('Number of schemes across sub-category by category ')
plt.show()
Findings:
# category wise returns
# Create figure and axis objects
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(12, 4),sharey='row')
# Create a bar plot for each axis
sns.barplot(df,x='category',y='returns_1yr',ax=axes[0])
sns.barplot(df,x='category',y='returns_3yr',ax=axes[1])
sns.barplot(df,x='category',y='returns_5yr',ax=axes[2])
# Set the title for each axis
axes[0].set_title('Returns over 1 year')
axes[0].set_xticklabels(df['category'].unique(),rotation='vertical')
axes[1].set_title('Returns over 3 year')
axes[1].set_xticklabels(df['category'].unique(),rotation='vertical')
axes[2].set_title('Returns over 5 year')
axes[2].set_xticklabels(df['category'].unique(),rotation='vertical')
plt.show()
Findings
# distribution of sub category
plt.figure(figsize=(10,8))
sns.countplot(df,y='sub_category')
plt.title('Number of schemes by sub-category')
plt.xlabel('Number of Schemes')
plt.show()
Findings
# returns of debt category and various sub category over 1 year, 3 year, 5 year
# Create figure and axis objects
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(12, 6),sharey='row')
# Create a bar plot for each axis
sns.barplot(debt_df,x='sub_category',y='returns_1yr',ax=axes[0])
sns.barplot(debt_df,x='sub_category',y='returns_3yr',ax=axes[1])
sns.barplot(debt_df,x='sub_category',y='returns_5yr',ax=axes[2])
# Set the title for each axis
axes[0].set_title('Returns over 1 year for debt funds')
axes[0].set_xticklabels(debt_df['sub_category'].unique(),rotation='vertical')
axes[1].set_title('Returns over 3 year for debt funds')
axes[1].set_xticklabels(debt_df['sub_category'].unique(),rotation='vertical')
axes[2].set_title('Returns over 5 year for debt funds')
axes[2].set_xticklabels(debt_df['sub_category'].unique(),rotation='vertical')
plt.tight_layout()
plt.show()
Findings
# returns of equity category and various sub category over 1 year, 3 year, 5 year
# Create figure and axis objects
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(12, 6),sharey='row')
# Create a bar plot for each axis
sns.barplot(equity_df,x='sub_category',y='returns_1yr',ax=axes[0])
sns.barplot(equity_df,x='sub_category',y='returns_3yr',ax=axes[1])
sns.barplot(equity_df,x='sub_category',y='returns_5yr',ax=axes[2])
# Set the title for each axis
axes[0].set_title('Returns over 1 year for equity fund')
axes[0].set_xticklabels(equity_df['sub_category'].unique(),rotation='vertical')
axes[1].set_title('Returns over 3 year for equity fund')
axes[1].set_xticklabels(equity_df['sub_category'].unique(),rotation='vertical')
axes[2].set_title('Returns over 5 year for equity fund')
axes[2].set_xticklabels(equity_df['sub_category'].unique(),rotation='vertical')
plt.tight_layout()
plt.show()
Findings
# returns of hybrid category and various sub category over 1 year, 3 year, 5 year
# Create figure and axis objects
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(12, 6),sharey='row')
# Create a bar plot for each axis
sns.barplot(hybrid_df,x='sub_category',y='returns_1yr',ax=axes[0])
sns.barplot(hybrid_df,x='sub_category',y='returns_3yr',ax=axes[1])
sns.barplot(hybrid_df,x='sub_category',y='returns_5yr',ax=axes[2])
# Set the title for each axis
axes[0].set_title('Returns over 1 year for Hybrid Fund')
axes[0].set_xticklabels(hybrid_df['sub_category'].unique(),rotation='vertical')
axes[1].set_title('Returns over 3 year for Hybrid Fund')
axes[1].set_xticklabels(hybrid_df['sub_category'].unique(),rotation='vertical')
axes[2].set_title('Returns over 5 year for Hybrid Fund')
axes[2].set_xticklabels(hybrid_df['sub_category'].unique(),rotation='vertical')
plt.tight_layout()
plt.show()
Findings
# returns of other category and various sub category over 1 year, 3 year, 5 year
# Create figure and axis objects
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(12, 6),sharey='row')
# Create a bar plot for each axis
sns.barplot(other_df,x='sub_category',y='returns_1yr',ax=axes[0])
sns.barplot(other_df,x='sub_category',y='returns_3yr',ax=axes[1])
sns.barplot(other_df,x='sub_category',y='returns_5yr',ax=axes[2])
# Set the title for each axis
axes[0].set_title('Returns over 1 year for Other Funds')
axes[1].set_title('Returns over 3 year for Other Funds')
axes[2].set_title('Returns over 5 year for Other Funds')
plt.tight_layout()
plt.show()
Findings
# returns of solution oriented category and various sub category over 1 year, 3 year, 5 year
# Create figure and axis objects
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(12, 6),sharey='row')
# Create a bar plot for each axis
sns.barplot(solution_df,x='sub_category',y='returns_1yr',ax=axes[0])
sns.barplot(solution_df,x='sub_category',y='returns_3yr',ax=axes[1])
sns.barplot(solution_df,x='sub_category',y='returns_5yr',ax=axes[2])
# Set the title for each axis
axes[0].set_title('Returns over 1 year for Solution Funds')
axes[1].set_title('Returns over 3 year for Solution Funds')
axes[2].set_title('Returns over 5 year for Solution Funds')
plt.tight_layout()
plt.show()
Findings
Alpha is defined as the excess return of the mutual fund over the benchmark return, on a risk-adjusted basis.
Risk-adjusted basis means we need to –
Calculate the difference between the mutual fund returns and the risk-free return. Calculate the difference between the benchmark return and the risk-free return, multiply this by the beta. Take the difference between 1 and 2
Mathematically,
Alpha = (MF Return – riskfree return) – (Benchmark return – riskfree return)*Beta
Lets put this in context with an example. Assume a certain fund gives you a return of 10%, its benchmark returns for the same duration is 7%. The beta of the fund is 0.75. What do you think the alpha assuming the risk-free rate is 4%?
Let’s apply the for formula and check –
Alpha = (10%-4%)-(7%-4%)*0.75 = 6% – 2.25% = 3.75% or 0.0375
As you can see, the alpha is not just the difference between the fund and its benchmark, which if true, the alpha would have been – 10% – 7% =3%
But rather, the alpha is 3.75%.
The fund has managed to generate a 10% return compared to the Index’s 7% while managing to stay significantly less volatile (remember beta is just 0.75). Hence we are rewarding the fund for its good behaviour or less volatile behaviour. Therefore the alpha is 3.75% as opposed to just 3%.
Now, imagine the same fund, with the same returns, but the beta is 1.3 instead of 0.75. What do you think is the alpha? By now, you should guess that since the beta is high, the fund gets penalised for its erratic behaviour. Therefore the alpha should be lower.
Let us see if the numbers agree to this thought.
Alpha = (10%-4%)-(7%-4%)*1.3 = 6% – 3.9% = 2.1%
See that? While the returns remain the same, thanks to beta, the alpha is significantly lesser on a risk-adjusted basis.
To conclude, alpha is the excess return of the fund over above the benchmark returns. Alpha is a risk-adjusted. The fund is rewarded if the returns are generated by keeping a low-risk profile and penalized for being volatile.
By now, you must have realized that volatility plays an important role in measuring mutual funds performance. Beta is a measure of volatility; it tells us how risky the fund is when compared to its benchmark. Beta is a relative risk and does not reveal the fund’s inherent risk.
The inherent risk of a fund is revealed by the ‘Standard Deviation’ of the fund.
sns.histplot(df['alpha'])
plt.title('Distribution of Alpha')
plt.show()
# high alpha value has higher returns
# Create figure and axis objects
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(12, 6),sharey='row')
# Create a scatter plot for each axis
sns.scatterplot(df,x='alpha',y='returns_1yr',ax=axes[0])
sns.scatterplot(df,x='alpha',y='returns_3yr',ax=axes[1])
sns.scatterplot(df,x='alpha',y='returns_5yr',ax=axes[2])
# Set the title for each axis
axes[0].set_title('Returns over 1 year')
axes[1].set_title('Returns over 3 year')
axes[2].set_title('Returns over 5 year')
plt.tight_layout()
plt.show()
Findings
# alpha vs category
sns.barplot(df,x='category',y='alpha')
plt.title('Alpha variation across category')
plt.show()
Findings
# alpha vs risk
sns.barplot(df,y='risk_level',x='alpha')
plt.title('Alpha variation across risk')
plt.show()
Findings
# alpha vs rating
sns.barplot(df,x='rating',y='alpha')
plt.title('Alpha variation across category')
plt.show()
Finding
sns.pairplot(df,vars=['alpha','sd','sortino','beta','sharpe'],kind='reg')
<seaborn.axisgrid.PairGrid at 0x1e4e0cb75b0>
Findings
The standard deviation of a stock or a mutual fund represents the riskiness of the stock or the mutual fund. Higher the standard Deviation, higher is the volatility of the asset. Higher the volatility, higher is the risk.
For example consider these 2 funds - The SD of the small-cap fund is 23.95% while the long term equity is 19.33%, which implies that the small-cap fund is way riskier compared to the long term equity fund.
To put this context, if you invest Rs.10,000/- across funds at the same time, then by the end of the year the profit or loss can be anywhere in this range –
Loss = Investment * (1-SD)
Gains = Investment * (1+SD)
The larger the SD, the larger the possibility of loss or gains.
Generally speaking, the SD for mid and small-cap funds are higher compared to large-cap stocks.
Do note, volatility or Standard Deviation should not worry you. Markets are volatile, and equities are volatile, mutual funds are volatile; this is the very nature of markets. So if you can’t fathom watching your investment see-saw between gains and loss, then perhaps you should reconsider your investment decision in equities.
But if you do invest in equities, then you need to learn to manage volatility. There are two ways to deal with this beast called ‘Volatility’ –
sns.histplot(df['sd'])
plt.title('Distribution of SD')
plt.show()
# sd vs category
sns.barplot(df,x='category',y='sd')
plt.title('Standard deviation variation across category')
plt.show()
Findings
# sd vs sub category
plt.figure(figsize=(10,8))
sns.barplot(df,y='sub_category',x='sd')
plt.title('Standard deviation variation across sub-category')
plt.show()
Findings
# sd vs risk
sns.barplot(df,y='risk_level',x='sd')
plt.title('Standard deviation variation across Risk Level')
plt.show()
Findings
# sd vs rating
sns.barplot(df,x='rating',y='sd')
plt.title('Standard deviation variation across rating')
plt.show()
Findings
Beta = Covariance between mutual fund and benchmark / Variance of benchmark index
A beta of 1 indicates that the mutual fund has the same level of volatility as the benchmark index. A beta of less than 1 indicates lower volatility, while a beta of greater than 1 indicates higher volatility.
For example, the Quant Small Cap Fund has a beta of 0.92, hence the fund is slightly less risky compared to its benchmark. I say slightly because it’s very close to 1. This implies, if Nifty Small Cap 250 Index falls by 1%, then Quant Small Cap Fund is expected to fall by 0.92%.
If the beta was 0.6 or 0.65, the fund is less risk or less volatile compared to its benchmark. Why? Because if Nifty Small Cap 250 Index falls by 1%, then Quant Small Cap Fund is expected to fall by only 0.65% and not 0.95%.
This is what I mean by ‘relative risk’; it gives us a perspective of how risky the fund is compared to its benchmark.
When you are looking at the Beta of a stock or an MF, it is very important to recognize the fact that the beta is a measure of relative risk, it tells us how risky the stock or MF is compared to its benchmark. Beta is not an indicator of the inherent risk of the stock or MF.
To put this in context, think about it this way, Ferrari is faster compared to a BMW, this comparison is like the beta. We measure the speed of car one against car two. But does this give you any indication of how fast the Ferrari is? Not really.
Likewise, while beta gives us a perspective of the relative riskiness of an asset, it does not give us the absolute or the inherent risk of the asset itself.
By now, you must have built your perception of beta. Let me ask you this – if a mutual fund has a high beta, do you think it is bad?
Well, the good, bad, ugly part of beta depends on another metric called the ‘Alpha’.
To know more about Mutual Fund Metrics - refer to Zerodha varsity
sns.histplot(df['beta'])
plt.title('Distribution of Beta')
plt.show()
# category vs beta
sns.boxplot(df,x='category',y='beta')
plt.ylim(-5,5)
plt.title('Beta variation across category')
plt.show()
Findings
# equity mutual funds has high beta
sns.boxplot(equity_df,x='sub_category',y='beta',)
plt.xticks(rotation='vertical')
plt.ylim(0,2)
plt.title('Beta variation across Equity sub-category')
plt.show()
# debt mutual funds has high beta variations
sns.boxplot(debt_df,x='sub_category',y='beta')
plt.xticks(rotation='vertical')
plt.ylim(-5,7)
plt.title('Beta variation across Debt sub-category')
plt.show()
# beta vs risk
sns.barplot(df,y='risk_level',x='beta')
plt.title('Beta variation across Risk Level')
plt.show()
# beta vs rating
sns.barplot(df,x='rating',y='beta')
plt.title('Beta variation across Rating')
plt.show()
Assume, there are two large-cap funds -Fund A and Fund B. Here is how they have performed in terms of returns –
Fund A – 14%
Fund B – 16%
Which of the two funds are better? Well, Fund B has a higher return, so without a doubt, Fund B is a better fund.
Now, consider the following –
Rf is the risk-free return. Along with the fund’s return, I’ve also stated the standard deviation/volatility/risk of the two funds. Now, which of the two funds do you think is better?
I guess it gets a little complex to figure out which these two funds are better given that we have to evaluate them on two parameters, i.e. both the risk and return.
Ignoring the risk, purely on a return basis, Fund B is better. Ignoring the return, purely on a risk basis, Fund A is better. But in reality, you cannot isolate risk and reward; you need to factor in both these and figure out which of these two are better.
The Sharpe Ratio helps us here. It bundles the concept of risk, reward, and the risk-free rate and gives us a perspective.
Sharpe ratio = [Fund Return – Risk-Free Return]/Standard Deviation of the fund
Lets apply the math for Fund A –
= [14% – 6%] / 28%
= 8%/28%
= 0.29
The number tells us that the fund generates 0.29 units of return (over and above the risk-free return) for every unit of risk undertaken.Naturally, by this measure, the higher the Sharpe ratio, the better it is as we all want higher returns for every unit of risk undertaken.
Lets see how this turns out for Fund B –
= [16% – 6% ] / 34%
= 10% / 34%
= 0.29
So it turns out that both the funds are similar in terms of their risk and reward perspective. And there is no advantage of choosing Fund A over Fund B.
Do note, Sharpe ratio considers only price based risk. It does not consider credit or interest rate risk. Hence, there is no point looking at the Sharpe ratio for debt funds.
sns.histplot(df['sharpe'])
plt.title('Distribution of Sharpe')
plt.show()
# category vs sharpe
sns.barplot(df,x='category',y='sharpe')
plt.title('Variation of Sharpe ratio across category')
plt.show()
Findings
# sharpe vs risk
sns.barplot(df,y='risk_level',x='sharpe')
plt.title('Sharpe variation across risk level')
plt.show()
Findings
# sharpe vs rating
sns.barplot(df,x='rating',y='sharpe')
plt.title('Sharpe variation across rating')
plt.show()
Findings
The Sortino’s ratio is an improvisation over the Sharpe Ratio, wherein the denominator has only the negative returns or the ‘downside risk’, is considered.
Hence, the Sortino’s Ratio is –
= [Fund Return – Risk-Free Return]/Downside Risk
The objective of Sortino’s ratio is to estimate the excess return adjusted for only the downside risk. Like the Sharpe ratio, higher the Sortino’s ratio, better it is.
sns.histplot(df['sortino'])
plt.title('Distribution of Sortino')
plt.show()
# category vs sortino
sns.barplot(df,y='category',x='sortino')
plt.title('Variation of Sortino ratio across category')
plt.show()
Findings
# sortino vs risk level
sns.barplot(df,y='risk_level',x='sortino')
plt.title('Sortino variation across risk level')
plt.show()
Findings
# sortino vs rating
sns.barplot(df,x='rating',y='sortino')
plt.title('Sortino variation across rating')
plt.show()
Finding:
sns.histplot(df[df['returns_1yr']<30],x='returns_1yr',hue='category',kde=True)
plt.title('Distribution of 1 year returns across category')
plt.show()
Findings
sns.histplot(df,x='returns_3yr',hue='category',kde=True)
plt.title('Distribution of 3 year returns across category')
plt.show()
Findings
sns.histplot(df,x='returns_5yr',hue='category',kde=True)
plt.title('Distribution of 5 year returns across category')
plt.show()
Findings
# top schemes in each category and sub category in terms of returns earned
category_idx = df.groupby("category")["returns_5yr"].idxmax()
category_result = df.loc[category_idx, ["category", "scheme_name", "returns_5yr"]]
category_result
| category | scheme_name | returns_5yr | |
|---|---|---|---|
| 63 | Debt | AXIS FTP – Series 104 – 1112Days | 14.0 |
| 608 | Equity | Quant Small Cap Fund | 23.2 |
| 607 | Hybrid | Quant Multi Asset Fund | 20.5 |
| 456 | Other | Kotak Multi Asset Allocator FoF – Dynamic – Di... | 15.3 |
| 252 | Solution Oriented | HDFC Retirement Savings Fund | 14.8 |
sub_cat_idx = df.groupby("sub_category")["returns_5yr"].idxmax()
sub_catresult = df.loc[sub_cat_idx, ["sub_category", "scheme_name", "returns_5yr"]]
sub_catresult
| sub_category | scheme_name | returns_5yr | |
|---|---|---|---|
| 599 | Aggressive Hybrid Mutual Funds | Quant Absolute Fund | 19.100000 |
| 86 | Arbitrage Mutual Funds | Bank of India Arbitrage Fund | 8.581053 |
| 182 | Banking and PSU Mutual Funds | Edelweiss Banking & Psu Debt Fund | 7.900000 |
| 226 | Childrens Funds | HDFC Children’s Gift Fund | 12.900000 |
| 436 | Conservative Hybrid Mutual Funds | Kotak Debt Hybrid Fund | 10.000000 |
| 623 | Contra Funds | SBI Contra Fund | 15.600000 |
| 267 | Corporate Bond Mutual Funds | HSBC Corporate Bond Fund | 13.300000 |
| 306 | Credit Risk Funds | ICICI Pru Credit Risk Fund | 8.100000 |
| 753 | Dividend Yield Funds | Templeton India Equity Income Fund | 14.500000 |
| 223 | Dynamic Asset Allocation or Balanced Advantage | HDFC Balanced Advantage Fund | 12.900000 |
| 275 | Dynamic Bond | HSBC Flexi Debt Fund | 13.300000 |
| 609 | ELSS Mutual Funds | Quant Tax Plan- Direct Growth | 22.200000 |
| 487 | Equity Savings Mutual Funds | Mahindra Manulife Equity Savings Fund | 13.300000 |
| 63 | Fixed Maturity Plans | AXIS FTP – Series 104 – 1112Days | 14.000000 |
| 601 | Flexi Cap Funds | Quant Flexi Cap Fund | 18.200000 |
| 313 | Floater Mutual Funds | ICICI Pru Floating Interest Fund | 7.200000 |
| 456 | FoFs Domestic | Kotak Multi Asset Allocator FoF – Dynamic – Di... | 15.300000 |
| 178 | FoFs Overseas | DSP World Mining Fund | 15.000000 |
| 371 | Focused Funds | IIFL Focused Equity Fund | 18.100000 |
| 157 | Gilt Mutual Funds | DSP G-Sec Fund | 8.600000 |
| 556 | Index Funds | Nippon India Index Fund | 13.000000 |
| 501 | Large & Mid Cap Funds | Mirae Asset Emerging Bluechip Fund | 15.100000 |
| 126 | Large Cap Mutual Funds | Canara Robeco Bluechip Equity Fund | 14.400000 |
| 413 | Liquid Mutual Funds | ITI Liquid Fund | 6.385446 |
| 32 | Low Duration Funds | Aditya Birla SL Low Duration Fund | 7.000000 |
| 34 | Medium Duration Funds | Aditya Birla SL Medium Term Plan – Direct Growth | 8.600000 |
| 555 | Medium to Long Duration Funds | Nippon India Income Fund | 7.900000 |
| 606 | Mid Cap Mutual Funds | Quant Mid Cap Fund | 19.800000 |
| 193 | Money Market Funds | Edelweiss Money Market Fund | 7.300000 |
| 607 | Multi Asset Allocation Mutual Funds | Quant Multi Asset Fund | 20.500000 |
| 600 | Multi Cap Funds | Quant Active Fund | 19.900000 |
| 812 | Overnight Mutual Funds | WhiteOak Capital Overnight Fund | 13.300000 |
| 252 | Retirement Funds | HDFC Retirement Savings Fund | 14.800000 |
| 715 | Sectoral / Thematic Mutual Funds | Tata Digital India Fund | 22.100000 |
| 345 | Short Duration Funds | ICICI Pru Short Term Fund | 7.700000 |
| 608 | Small Cap Mutual Funds | Quant Small Cap Fund | 23.200000 |
| 538 | Ultra Short Duration Funds | Navi Ultra Short Term Fund | 13.300000 |
| 351 | Value Funds | ICICI Pru Value Discovery Fund | 14.900000 |
This notebook is for Educational purpose only and one should check with there advisor before investing in mutual funds.